A CNN-based Post-Processor for Perceptually-Optimized Immersive Media Compression
In recent years, resolution adaptation based on deep neural networks has
enabled significant performance gains for conventional (2D) video codecs. This
paper investigates the effectiveness of spatial resolution resampling in the
context of immersive content. The proposed approach reduces the spatial
resolution of input multi-view videos before encoding, and reconstructs their
original resolution after decoding. During the up-sampling process, an advanced
CNN model is used to reduce potential re-sampling, compression, and synthesis
artifacts. This work has been fully tested with the TMIV coding standard using
a Versatile Video Coding (VVC) codec. The results demonstrate that the proposed
method achieves a significant rate-quality performance improvement for the
majority of the test sequences, with an average BD-VMAF improvement of 3.07
over all sequences.
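As a rough illustration of the resampling pipeline described above (not the paper's implementation), the sketch below down-samples a frame before a hypothetical encode and restores its resolution after decoding; the CNN post-processor that the paper uses to remove re-sampling and compression artifacts is only noted as a placeholder:

```python
# Illustrative sketch only: the paper's CNN artifact-reduction model is
# replaced here by a comment, and frames are plain 2D lists of luma values.

def downsample_2x(frame):
    """Average 2x2 blocks of a 2D frame, halving each dimension."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
             frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]

def upsample_2x(frame):
    """Nearest-neighbour up-sampling back to the original resolution."""
    return [
        [frame[y // 2][x // 2] for x in range(2 * len(frame[0]))]
        for y in range(2 * len(frame))
    ]

frame = [[float(x + y) for x in range(4)] for y in range(4)]
low = downsample_2x(frame)   # encoding would happen at this reduced resolution
rec = upsample_2x(low)       # after decoding; the paper applies a CNN here
assert len(rec) == len(frame) and len(rec[0]) == len(frame[0])
```

In the paper the up-sampling step is where the advanced CNN model operates, reducing the resampling, compression, and synthesis artifacts that a naive interpolation such as the one above would leave behind.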
UGC Quality Assessment: Exploring the Impact of Saliency in Deep Feature-Based Quality Assessment
The volume of User Generated Content (UGC) has increased in recent years. The
challenge with this type of content is assessing its quality. So far, the
state-of-the-art metrics do not exhibit a high correlation with
perceptual quality. In this paper, we explore state-of-the-art metrics that
extract/combine natural scene statistics and deep neural network features. We
experiment with these by introducing saliency maps to improve perceptibility.
We train and test our models using public datasets, namely, YouTube-UGC and
KoNViD-1k. Preliminary results indicate that high correlations are achieved by
using deep features alone, while adding saliency does not consistently boost
performance. Our results and code will be made publicly available to serve as a
benchmark for the research community and can be found on our project page:
https://github.com/xinyiW915/SPIE-2023-Supplementary
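The idea of introducing saliency maps into a feature-based quality metric can be sketched as a saliency-weighted pooling step; the function name and all values below are illustrative assumptions, not the paper's code:

```python
def saliency_weighted_pool(features, saliency):
    """Pool per-patch feature values into one score, weighting each patch
    by its saliency. Uniform saliency reduces to a plain mean."""
    total = sum(saliency)
    if total == 0:
        return sum(features) / len(features)
    return sum(f * s for f, s in zip(features, saliency)) / total

features = [0.2, 0.8, 0.5]
print(saliency_weighted_pool(features, [1.0, 1.0, 1.0]))  # plain mean: 0.5
print(saliency_weighted_pool(features, [0.1, 0.8, 0.1]))  # emphasises patch 2
```

The second call shows the intended effect: a salient patch dominates the pooled score, which may or may not track perceptual quality better, consistent with the paper's mixed findings.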
Study of Compression Statistics and Prediction of Rate-Distortion Curves for Video Texture
Encoding textural content remains a challenge for current standardised video
codecs. It is therefore beneficial to understand video textures in terms of
both their spatio-temporal characteristics and their encoding statistics in
order to optimize encoding performance. In this paper, we analyse the
spatio-temporal features and statistics of video textures, explore the
rate-quality performance of different texture types and investigate models to
mathematically describe them. For all considered theoretical models, we employ
machine-learning regression to predict the rate-quality curves based solely on
selected spatio-temporal features extracted from uncompressed content. All
experiments were performed on homogeneous video textures to ensure validity of
the observations. The results of the regression indicate that using an
exponential model we can more accurately predict the expected rate-quality
curve (with a mean Bjøntegaard Delta rate of 0.46% over the considered
dataset) while maintaining a low relative complexity. This is expected to be
adopted in in-loop processes for faster encoding decisions, such as
rate-distortion optimisation, adaptive quantisation, and partitioning.
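An exponential rate-quality model of the kind the abstract refers to can be fitted in closed form by least squares on the log of the rate; the function name and the synthetic data points below are assumptions for illustration, not the paper's experimental setup:

```python
import math

def fit_exponential(qs, rates):
    """Fit rate = a * exp(b * q) by linear least squares on log(rate).
    One candidate model form; the paper finds the exponential model
    most accurate for homogeneous video textures."""
    n = len(qs)
    ys = [math.log(r) for r in rates]
    mq = sum(qs) / n
    my = sum(ys) / n
    b = (sum((q - mq) * (y - my) for q, y in zip(qs, ys)) /
         sum((q - mq) ** 2 for q in qs))
    a = math.exp(my - b * mq)
    return a, b

# Synthetic rate-quality points that follow rate = 1000 * exp(-0.1 * q)
qs = [22, 27, 32, 37]
rates = [1000 * math.exp(-0.1 * q) for q in qs]
a, b = fit_exponential(qs, rates)
```

In the paper's setting, the model parameters would instead be predicted by machine-learning regression from spatio-temporal features of the uncompressed texture, so that the curve is obtained without performing any test encodes.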
Efficient Bitrate Ladder Construction for Content-Optimized Adaptive Video Streaming
One of the challenges faced by many video providers is the heterogeneity of
network specifications, user requirements, and content compression performance.
A one-size-fits-all fixed bitrate ladder is inadequate for ensuring a high
quality of user experience without re-buffering or annoying compression
artifacts. However, a content-tailored solution, based on
extensively encoding across all resolutions and over a wide quality range is
highly expensive in terms of computational, financial, and energy costs.
Motivated by this, we propose an approach that exploits machine learning to
predict a content-optimized bitrate ladder. The method extracts spatio-temporal
features from the uncompressed content, trains machine-learning models to
predict the Pareto front parameters, and, based on that, builds the ladder
within a defined bitrate range. The method has the benefit of significantly
reducing the number of encodes required per sequence. The presented results,
based on 100 HEVC-encoded sequences, demonstrate a reduction in the number of
encodes required when compared to an exhaustive search and an
interpolation-based method, by 89.06% and 61.46%, respectively, at the cost of
an average Bjøntegaard Delta rate difference of 1.78% compared to the
exhaustive approach. Finally, a hybrid method is introduced that selects either
the proposed or the interpolation-based method depending on the sequence
features. This results in an overall 83.83% reduction of required encodings at
the cost of an average Bjøntegaard Delta rate difference of 1.26%.
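The ladder-construction step can be sketched as selecting rungs from a predicted Pareto front within a defined bitrate range; all names, resolutions, and bitrate values below are hypothetical, not the paper's method or data:

```python
import math

def build_ladder(pareto_points, min_kbps, max_kbps, rungs):
    """Pick log-spaced target bitrates in [min_kbps, max_kbps] and, for
    each target, the Pareto-front point with the closest bitrate.
    `pareto_points` stands in for the (bitrate, resolution) operating
    points that the regression models would predict."""
    ladder = []
    for i in range(rungs):
        t = math.exp(math.log(min_kbps) +
                     i * (math.log(max_kbps) - math.log(min_kbps)) / (rungs - 1))
        best = min(pareto_points, key=lambda p: abs(p[0] - t))
        ladder.append((round(t), best[1]))
    return ladder

points = [(300, "480p"), (900, "720p"), (2500, "1080p"), (6000, "2160p")]
ladder = build_ladder(points, 300, 6000, 4)
```

Because the Pareto points are predicted from spatio-temporal features rather than measured, the ladder is obtained without the exhaustive per-resolution encoding sweep, which is where the reported reduction in encodes comes from.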